Despite the recent success of self-supervised contrastive learning models based on 3D point cloud representations, the adversarial robustness of such pre-trained models has raised concerns. Adversarial contrastive learning (ACL) is considered an effective way to improve the robustness of pre-trained models. Meanwhile, the projector is regarded as an effective component for removing unnecessary feature information during contrastive pre-training, and most ACL works use a contrastive loss computed on the projected feature representations to generate adversarial examples during pre-training, whereas the "unprojected" feature representations are used to generate adversarial inputs during inference. Because of the distribution gap between the projected and "unprojected" features, such models are limited in obtaining reliable feature representations for downstream tasks. We introduce a new method that uses the "unprojected" feature representations within the contrastive learning framework and leverages a virtual adversarial loss to generate high-quality 3D adversarial examples for adversarial training. We also introduce a robustness-aware loss function for adversarial self-supervised contrastive learning. In addition, we find that selecting points with a high Difference of Normals (DoN) operator response as an additional input for adversarial self-supervised contrastive learning can significantly improve the adversarial robustness of the pre-trained model. We validate our method on downstream tasks, including 3D classification and 3D segmentation, using multiple datasets. It obtains robust accuracy comparable to state-of-the-art adversarial learning methods.
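The Difference of Normals (DoN) selection mentioned above can be computed without any learned components. Below is a minimal sketch under stated assumptions (the two radii, the fallback normal, and the top-k selection size are illustrative choices, not the paper's exact settings):

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, tree, radius):
    """PCA normal estimation within a fixed-radius neighborhood."""
    normals = np.zeros_like(points)
    for i, p in enumerate(points):
        idx = tree.query_ball_point(p, radius)
        if len(idx) < 3:                      # too few neighbors: fall back to a default normal
            normals[i] = np.array([0.0, 0.0, 1.0])
            continue
        nbrs = points[idx] - points[idx].mean(axis=0)
        # the smallest right-singular vector of the centered neighborhood approximates the normal
        _, _, vt = np.linalg.svd(nbrs, full_matrices=False)
        normals[i] = vt[-1]
    return normals

def don_scores(points, r_small=0.1, r_large=0.3):
    """Difference-of-Normals magnitude per point (larger = higher geometric variation)."""
    tree = cKDTree(points)
    n_small = estimate_normals(points, tree, r_small)
    n_large = estimate_normals(points, tree, r_large)
    # resolve the sign ambiguity of PCA normals before differencing
    flip = np.sign(np.sum(n_small * n_large, axis=1, keepdims=True))
    flip[flip == 0] = 1.0
    don = 0.5 * (n_small - flip * n_large)
    return np.linalg.norm(don, axis=1)

# keep the highest-DoN points as the extra input to contrastive pre-training
points = np.random.rand(2048, 3).astype(np.float32)   # stand-in point cloud
scores = don_scores(points)
high_don_points = points[np.argsort(scores)[-512:]]
```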
This paper addresses the task of uncalibrated photometric stereo for 3D object reconstruction, where the object shape, object reflectance, and lighting directions are all unknown. This is an extremely difficult task, and the challenge is further compounded by the well-known generalized bas-relief (GBR) ambiguity in photometric stereo. Previous methods for resolving this ambiguity either rely on overly simplified reflectance models or assume a special light distribution. We propose a new method that jointly optimizes object shape, light directions, and light intensities under general surface and lighting assumptions. Specularities are explicitly exploited to resolve uncalibrated photometric stereo through a neural inverse rendering process. We gradually fit specularities from shiny to rough using a novel progressive specular basis. Our method leverages a physically based rendering equation by minimizing the reconstruction error on a per-object basis. It demonstrates state-of-the-art accuracy in light estimation and shape recovery on real-world datasets.
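For reference, the kind of physically based image formation such a neural inverse rendering pipeline fits can be written compactly. The sketch below is a Lambertian-plus-specular-basis toy model under assumed names and constants (the basis weights and exponents are illustrative, not the paper's exact formulation):

```python
import numpy as np

def render_pixel(normal, light_dir, light_intensity, albedo,
                 spec_weights, spec_exponents, view_dir):
    """Render one pixel: Lambertian diffuse term plus a sum of specular basis lobes."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    n_dot_l = max(np.dot(n, l), 0.0)
    diffuse = albedo * n_dot_l
    # Blinn-Phong-style lobes with progressively rougher exponents (shiny -> rough)
    h = (l + v) / np.linalg.norm(l + v)
    n_dot_h = max(np.dot(n, h), 0.0)
    specular = sum(w * n_dot_h ** a for w, a in zip(spec_weights, spec_exponents))
    return light_intensity * (diffuse + specular)

# toy usage: one surface point lit from above-right, viewed along +z
intensity = render_pixel(
    normal=np.array([0.0, 0.0, 1.0]),
    light_dir=np.array([0.3, 0.2, 1.0]),
    light_intensity=1.5,
    albedo=0.7,
    spec_weights=[0.4, 0.2],        # assumed per-basis weights (learned in practice)
    spec_exponents=[512.0, 32.0],   # shiny-to-rough exponents, fit progressively
    view_dir=np.array([0.0, 0.0, 1.0]),
)
```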
Existing keyframe-based motion synthesis mainly focuses on generating cyclic actions or short-term motions, such as walking, running, and transitions between nearby poses. However, these methods greatly degrade the naturalness and diversity of the synthesized motion when dealing with complex and improvised movements, such as dance performances and martial arts. In addition, current research lacks fine-grained control over the generated motion, which is essential for intelligent human-computer interaction and animation creation. In this paper, we propose a novel keyframe-based motion generation network conditioned on multiple constraints, which achieves diverse dance synthesis from learned knowledge. Specifically, the algorithm is built mainly on recurrent neural network (RNN) and Transformer architectures. The backbone of our network is a hierarchical RNN module composed of two long short-term memory (LSTM) units, in which the first LSTM embeds the pose information of historical frames into a latent space and the second LSTM predicts the human pose of the next frame. In addition, our framework contains two Transformer-based controllers, which model the constraints of the root trajectory and the velocity factor respectively, to better exploit the temporal context of the frames and achieve fine-grained motion control; a minimal sketch of this backbone follows below. We validate the proposed method on a dance dataset containing a variety of modern dances. The results of three quantitative analyses verify the advantages of our algorithm. Video and qualitative experimental results show that the complex motion sequences generated by our algorithm achieve diverse and smooth motion transitions between keyframes, even for long-term synthesis.
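The following PyTorch sketch illustrates the hierarchical two-LSTM backbone described above; the layer sizes, pose dimension, and the way controller features are fused are assumptions made for illustration only:

```python
import torch
import torch.nn as nn

class HierarchicalPoseRNN(nn.Module):
    """Two stacked LSTMs: the first embeds historical poses, the second predicts the next pose."""
    def __init__(self, pose_dim=63, hidden_dim=256, ctrl_dim=32):
        super().__init__()
        self.encoder_lstm = nn.LSTM(pose_dim, hidden_dim, batch_first=True)
        self.decoder_lstm = nn.LSTM(hidden_dim + ctrl_dim, hidden_dim, batch_first=True)
        self.pose_head = nn.Linear(hidden_dim, pose_dim)

    def forward(self, past_poses, ctrl_features):
        # past_poses: (B, T, pose_dim); ctrl_features: (B, T, ctrl_dim), e.g. produced by
        # Transformer-based controllers encoding root-trajectory / velocity constraints
        latent, _ = self.encoder_lstm(past_poses)
        latent = torch.cat([latent, ctrl_features], dim=-1)
        decoded, _ = self.decoder_lstm(latent)
        return self.pose_head(decoded[:, -1])   # pose of the next frame

model = HierarchicalPoseRNN()
next_pose = model(torch.randn(4, 30, 63), torch.randn(4, 30, 32))
print(next_pose.shape)  # torch.Size([4, 63])
```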
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though the instantiations share the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing \textbf{SoTA} CNN-/Transformer-based models while trading off model accuracy and efficiency well.
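One way to read the iRMB idea is a single inverted-residual block that contains both a lightweight attention step (long-distance interactions) and a depth-wise convolution (short-distance dependency). The PyTorch sketch below is our own simplified reading under assumed dimensions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class SimplifiedIRMB(nn.Module):
    """Inverted residual: expand -> multi-head self-attention -> depth-wise conv -> project."""
    def __init__(self, dim=64, expand=2, heads=4):
        super().__init__()
        hidden = dim * expand
        self.norm = nn.BatchNorm2d(dim)
        self.expand = nn.Conv2d(dim, hidden, kernel_size=1)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.project = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        y = self.expand(self.norm(x))
        # attention over flattened spatial tokens (Transformer-like long-range modeling)
        tokens = y.flatten(2).transpose(1, 2)                 # (B, H*W, hidden)
        tokens, _ = self.attn(tokens, tokens, tokens)
        y = y + tokens.transpose(1, 2).reshape(b, -1, h, w)
        y = self.dwconv(y)                                    # CNN-like local modeling
        return x + self.project(y)                            # residual connection

block = SimplifiedIRMB(dim=64)
out = block(torch.randn(2, 64, 14, 14))
print(out.shape)  # torch.Size([2, 64, 14, 14])
```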
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for the language model of a state-of-the-art BERT-based QA system. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experiments on five extractive QA datasets demonstrate that our technique achieves on-par performance with existing state-of-the-art QA systems, with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
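The question-generation step described above can be illustrated in a few lines of Python. The wh-word choice and surface templates below are assumptions about a typical implementation, not PIE-QG's exact templates:

```python
def triples_to_qa(triples):
    """Turn <subject, predicate, object> triples into synthetic QA pairs.

    For each triple, one question treats the subject as the answer and one
    treats the object as the answer, mirroring the strategy in the abstract.
    """
    qa_pairs = []
    for subj, pred, obj in triples:
        qa_pairs.append((f"What {pred} {obj}?", subj))       # answer: subject
        qa_pairs.append((f"What did {subj} {pred}?", obj))   # answer: object
    return qa_pairs

triples = [("Marie Curie", "discovered", "polonium"),
           ("the Wright brothers", "built", "the first powered airplane")]
for question, answer in triples_to_qa(triples):
    print(question, "->", answer)
```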
Transformer has achieved impressive successes for various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement brought by ImageNet pretrained weights significantly degrades when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens under a different perturbation. To maximally exploit the Transformer with limited medical data, we propose an auxiliary difficulty ranking task: the Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours to distill transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
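A compact sketch of the two training signals described above, a BYOL-style prediction loss between online and target branches plus an auxiliary head that ranks which branch received the harder perturbation, is given below; the dimensions and the binary form of the ranking head are our own assumptions:

```python
import torch
import torch.nn.functional as F

def bolt_style_losses(online_pred, target_repr, difficulty_logits, harder_branch):
    """online_pred predicts target_repr (stop-gradient); an auxiliary head guesses
    which branch (0 = online, 1 = target) processed the more difficult tokens."""
    # BYOL-style consistency: cosine distance between prediction and (detached) target
    p = F.normalize(online_pred, dim=-1)
    z = F.normalize(target_repr.detach(), dim=-1)
    consistency = (2 - 2 * (p * z).sum(dim=-1)).mean()
    # auxiliary difficulty-ranking task as a binary classification
    ranking = F.cross_entropy(difficulty_logits, harder_branch)
    return consistency + ranking

loss = bolt_style_losses(
    online_pred=torch.randn(8, 256),
    target_repr=torch.randn(8, 256),
    difficulty_logits=torch.randn(8, 2),
    harder_branch=torch.randint(0, 2, (8,)),
)
```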
Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGEs to infer inductively. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework, AnKGE, to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity, relation, and triple levels. In AnKGE, we train an analogy function for each level of analogical inference, which takes the original element embedding from a well-trained KGE model as input and outputs the analogical object embedding. To combine the inductive inference capability of the original KGE model with the analogical inference capability added by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights in the score function for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
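The interpolation of base and analogy scores with adaptive weights can be sketched in a few lines; the weighting scheme below, a learned per-triple gate, is an assumption about one reasonable realization rather than AnKGE's exact formulation:

```python
import torch
import torch.nn as nn

class InterpolatedScorer(nn.Module):
    """Combine a frozen base-KGE score with an analogy score via an adaptive weight."""
    def __init__(self, feat_dim=200):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, base_score, analogy_score, triple_embedding):
        w = self.gate(triple_embedding).squeeze(-1)   # adaptive weight in (0, 1) per triple
        return w * base_score + (1.0 - w) * analogy_score

scorer = InterpolatedScorer()
final = scorer(torch.randn(16), torch.randn(16), torch.randn(16, 200))
print(final.shape)  # torch.Size([16])
```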
Digital engineering transformation is a crucial process for the engineering paradigm shifts in the fourth industrial revolution (4IR), and artificial intelligence (AI) is a critical enabling technology in digital engineering transformation. This article discusses the following research questions: What are the fundamental changes in the 4IR? More specifically, what are the fundamental changes in engineering? What is digital engineering? What are the main uncertainties there? What is trustworthy AI? Why is it important today? What are emerging engineering paradigm shifts in the 4IR? What is the relationship between the data-intensive paradigm and digital engineering transformation? What should we do for digitalization? From investigating the pattern of industrial revolutions, this article argues that ubiquitous machine intelligence (uMI) is the defining power brought by the 4IR. Digitalization is a condition to leverage ubiquitous machine intelligence. Digital engineering transformation towards Industry 4.0 has three essential building blocks: digitalization of engineering, leveraging ubiquitous machine intelligence, and building digital trust and security. The engineering design community at large is facing an excellent opportunity to bring the new capabilities of ubiquitous machine intelligence and trustworthy AI principles, as well as digital trust, together in various engineering systems design to ensure the trustworthiness of systems in Industry 4.0.
Surgical robot automation has attracted increasing research interest over the past decade, given its huge potential to benefit surgeons, nurses, and patients. Recently, the learning paradigm of embodied AI has demonstrated promising ability to learn good control policies for various complex tasks, where embodied AI simulators play an essential role in facilitating relevant research. However, existing open-sourced simulators for surgical robots still do not sufficiently support human interaction through physical input devices, which limits effective investigation of how human demonstrations affect policy learning. In this paper, we study human-in-the-loop embodied intelligence with a new interactive simulation platform for surgical robot learning. Specifically, we establish our platform based on our previously released SurRoL simulator, with several new features co-developed to allow high-quality human interaction via an input device. With these, we further propose to collect human demonstrations and imitate the action patterns to achieve more effective policy learning. We showcase the improvement of our simulation environment with the designed new features and tasks, and validate state-of-the-art reinforcement learning algorithms using the interactive environment. Promising results are obtained, and we hope they pave the way for future research on surgical embodied intelligence. Our platform is released and will be continuously updated at: https://med-air.github.io/SurRoL/
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling the distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. Particularly, the proposed method still generates high-quality molecular graphs in a limited number of steps.
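For intuition, the forward perturbation in such graph diffusion models follows a standard Gaussian perturbation kernel applied jointly to node features and a dequantized adjacency matrix. The sketch below assumes a VP-SDE-style schedule with illustrative constants and a simple 0.5 threshold for the discrete condition; it is not CDGS's exact procedure:

```python
import torch

def forward_perturb(x0, adj0, t, beta_min=0.1, beta_max=20.0):
    """Sample x_t, A_t ~ N(mean(t) * x_0, sigma(t)^2 I) under a VP-SDE-style schedule.

    x0: (N, F) node features; adj0: (N, N) dequantized adjacency in [0, 1]; t in (0, 1].
    """
    # log mean coefficient of the VP SDE: integral of -0.5 * beta(s) ds from 0 to t
    log_mean = -0.25 * t ** 2 * (beta_max - beta_min) - 0.5 * t * beta_min
    mean_coef = torch.exp(torch.as_tensor(log_mean))
    std = torch.sqrt(1.0 - mean_coef ** 2)
    x_t = mean_coef * x0 + std * torch.randn_like(x0)
    a_t = mean_coef * adj0 + std * torch.randn_like(adj0)
    a_t = 0.5 * (a_t + a_t.transpose(-1, -2))       # keep the noisy adjacency symmetric
    discrete_adj = (a_t > 0.5).float()              # discrete structure used as the condition
    return x_t, a_t, discrete_adj

x_t, a_t, cond = forward_perturb(torch.randn(9, 16), torch.rand(9, 9), t=0.3)
```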